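Also known as 火车采集器 (LocoySpider), this is a professional tool for scraping, processing, analyzing, and mining Internet data. It can flexibly and quickly scrape data scattered across web pages and, through a series of analysis and processing steps, accurately extract the data required. After twelve years of upgrades and updates, LocoySpider has built up a large user base and a good reputation, and is billed as the most popular web data collection software currently available.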
Scientific software resource navigation
Tags: #Web crawler
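Web scraping and data collection software, suited to product, operations, sales, and data analysis teams, government agencies, e-commerce practitioners, academic researchers, and others.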
Nutch is a mature, production-ready web crawler. Nutch 1.x enables fine-grained configuration and relies on Apache Hadoop data structures, which are well suited to batch processing.
Heritrix is a web crawler designed for web archiving. It was written by the Internet Archive. It is available under a free software license and written in Java. The main interface is accessible using